The Annals of Statistics 

2009, Vol. 37, No. 5A, 2351-2376 

DOI: 10.1214/08-AOS657 

© Institute of Mathematical Statistics, 2009 



ASYMPTOTIC THEORY FOR THE SEMIPARAMETRIC 
ACCELERATED FAILURE TIME MODEL WITH MISSING DATA 

By Bin Nan,1 John D. Kalbfleisch and Menggang Yu 

University of Michigan, University of Michigan and Indiana University 

We consider a class of doubly weighted rank-based estimating methods for the transformation (or accelerated failure time) model with missing data as arise, for example, in case-cohort studies. The weights considered may not be predictable as required in a martingale stochastic process formulation. We treat the general problem as a semiparametric estimating equation problem and provide proofs of asymptotic properties for the weighted estimators, with either true weights or estimated weights, by using empirical process theory where martingale theory may fail. Simulations show that the outcome-dependent weighted method works well for finite samples in case-cohort studies and improves efficiency compared to methods based on predictable weights. Further, it is seen that the method is even more efficient when estimated weights are used, as is commonly the case in the missing data literature. The Gehan censored data Wilcoxon weights are found to be surprisingly efficient in a wide class of problems. 

1. Introduction. Instead of modeling the hazard function for censored survival data, as in the Cox model [6], modeling the (transformed) failure time directly is sometimes appealing to practitioners since it postulates a simple relationship between the response variable and covariates with easily interpretable parameters. Let $T$ denote the failure time transformed by a known monotone function $h$, $C$ be the corresponding transformed censoring time, $\Delta = 1(T \le C)$ and $Y = \min(T, C)$, where $1(\cdot)$ denotes an indicator function. The model of interest is 

(1.1) $T_i = \theta_0' Z_i + e_i, \quad i = 1,\dots,n,$ 



Received January 2008; revised July 2008. 

1 Supported in part by NSF Grant DMS-07-06700. 

AMS 2000 subject classifications. Primary 62E20, 62N01; secondary 62D05. 

Key words and phrases. Accelerated failure time model, case-cohort study, censored 
linear regression, Donsker class, empirical processes, Glivenko-Cantelli class, pseudo Z- 
estimator, nonpredictable weights, rank estimating equation, semiparametric method. 

This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2009, Vol. 37, No. 5A, 2351-2376. This reprint differs from the original in 
pagination and typographic detail. 






where the $e_i$'s are independent and identically distributed (i.i.d.) with unknown distribution $F$, and $e_i$ is independent of $(Z_i, C_i)$ for all $i$. When $h = \log$, the model is called the accelerated failure time model (see, e.g., [12]). 

For a cohort of $n$ i.i.d. observations of $X_i = (Y_i, \Delta_i, Z_i)$, $i = 1,\dots,n$, [4] proposed an imputation type of least squares method, where the censored survival time is replaced by an estimate of the mean residual life conditional on the covariates, which is obtained from the Kaplan-Meier estimator on the residual scale. Stute [24, 25] proposed a weighted least squares method with weights obtained from the Kaplan-Meier estimator for the transformed survival time. [21, 26] and [30], among others, studied the rank-based estimating method and proved the asymptotic properties using martingale theory for counting processes. 

In this article, we consider a general rank-based estimating method for model (1.1) in the presence of missing data as arise, for example, in case-cohort studies (e.g., [19, 23]) where data are missing by design. Specifically, let $Z_i = (Z_{1i}', Z_{2i}')'$ and assume that $Z_{1i}$ is missing at random (see [14]), while $Z_{2i}$, $Y_i$ and $\Delta_i$ are always observed for all $i$. The situations where $Z_i = Z_{1i}$ for all $i$, or where $Z_{2i}$ is not included in model (1.1), are special cases. In the latter of these special cases, $Z_{2i}$ is usually called an auxiliary variable in the missing data literature. The approach in this article extends the work of [16] for case-cohort studies, where weights are predictable and the counting process approach of [26] applies. It can be applied to general two-phase outcome-dependent sampling designs for censored survival data and allows the use of nonpredictable weights that can yield more efficient parameter estimates. The proof of efficiency gains from using estimated weights, even when the true weights are known, follows the approach of [18]. 

This article is organized as follows. In Section 2, we introduce the doubly weighted rank-based estimating method with arbitrary weights (i.e., either predictable or nonpredictable), and link the proposed estimating function to a semiparametric framework that is more suitable for applying empirical process theory. Methods based on both known weights and estimated weights are considered. We describe asymptotic properties of the proposed estimators in Section 3, with detailed proofs given in Section 6. In Section 4, we discuss the asymptotic efficiency and some simulation results that compare methods using predictable weights with methods using nonpredictable weights, and methods using known weights with methods using estimated weights. We make a few concluding remarks in Section 5. 

2. Doubly weighted semiparametric estimating function. For the $i$th subject, $Z_{2i}$, $Y_i$ and $\Delta_i$ are always observed. Let $R_i$ be the missing data indicator that takes value 1 if $Z_{1i}$ is also observed and 0 otherwise. Suppose that $Z_{1i}$ is missing at random, so that 

$\pi_i = \Pr(R_i = 1 \mid Z_i, Y_i, \Delta_i) = \Pr(R_i = 1 \mid Z_{2i}, Y_i, \Delta_i)$ 

for each $i$. This holds, for example, when independent Bernoulli sampling is implemented in a two-phase sampling design, which includes the case-cohort study as a special case. 

To estimate $\theta_0$ in model (1.1), we follow [15] and define the following random map 

(2.1) $\Psi_n(\theta, \eta, \rho) = \frac{1}{n}\sum_{i=1}^{n} \psi(X_i; \theta, \eta, \rho) = \frac{1}{n}\sum_{i=1}^{n} \Omega_i\, \rho(Y_i - \theta' Z_i, \theta)\{Z_i - \eta(Y_i - \theta' Z_i, \theta)\}\Delta_i,$ 

where $\theta \in \Theta \subset \mathbb{R}^d$ is the $d$-dimensional Euclidean parameter of interest with unknown true value $\theta_0$, and $\eta$ and $\rho$ are real valued (vectors of) functions that can be viewed as infinite dimensional nuisance parameters. 

When $\eta(t,\theta)$ is replaced by an estimator of the true function (see [21]) 

$\eta_0(t,\theta) = \frac{E\{1(Y - \theta' Z \ge t) Z\}}{E\{1(Y - \theta' Z \ge t)\}},$ 

with $\eta_0(t,\theta_0) = E(Z \mid Y - \theta_0' Z \ge t)$, random map (2.1) becomes a weighted estimating function for $\theta$, where $\Omega_i$ are subject specific weights and $\rho(t,\theta)$ is a weight function. Clearly such an estimating function is semiparametric. 

To be more general, we assume that the true functional forms of $\eta$ and $\rho$ are unknown and need to be estimated, and study the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ with 

(2.2) $\hat\eta_n(t,\theta) = \frac{\sum_{j=1}^{n} W_j 1(Y_j - \theta' Z_j \ge t) Z_j}{\sum_{j=1}^{n} W_j 1(Y_j - \theta' Z_j \ge t)},$ 

where $W_j$ are subject specific weights that may or may not equal $\Omega_j$. This is the source of the term "double weights" (see [31]); the purpose of introducing two possibly different subject specific weights will soon become clear. A particularly interesting weight function $\rho(t,\theta)$ is taken to be $\rho_0(t,\theta) = \Pr(Y - \theta' Z \ge t)$, and it can be estimated by 

(2.3) $\hat\rho_n(t,\theta) = \frac{1}{n}\sum_{j=1}^{n} W_j 1(Y_j - \theta' Z_j \ge t),$ 

a weighted Gehan-type weight. This type of weight provides a very desirable property: the corresponding estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ is monotone in $\theta$. See [31] for the detailed derivation. 
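
As a rough illustration of (2.1)-(2.3), the following Python sketch evaluates the doubly weighted estimating function for a scalar covariate. The function name `psi_n`, its arguments, the normalization of the Gehan-type weight by $n$ and the convention used when a weighted risk set is empty are all assumptions made for this sketch; it is not the authors' implementation.

```python
import numpy as np

def psi_n(theta, y, delta, z, omega=None, w=None, gehan=True):
    """Sketch of the doubly weighted rank estimating function (scalar theta, scalar Z).

    omega, w : subject specific weights Omega_i and W_i (default: all ones).
    gehan    : if True use the Gehan-type weight rho_hat_n, else logrank (rho = 1).
    """
    y, delta, z = (np.asarray(a, dtype=float) for a in (y, delta, z))
    n = y.size
    omega = np.ones(n) if omega is None else np.asarray(omega, dtype=float)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)

    e = y - theta * z                          # residuals e_i(theta) = Y_i - theta * Z_i
    at_risk = e[None, :] >= e[:, None]         # at_risk[i, j] = 1(e_j >= e_i)
    d0 = (at_risk * w).sum(axis=1)             # sum_j W_j 1(e_j >= e_i)
    d1 = (at_risk * (w * z)).sum(axis=1)       # sum_j W_j 1(e_j >= e_i) Z_j

    # Hypothetical convention: if the weighted risk set is empty, set eta_hat to 0.
    safe = np.where(d0 > 0, d0, 1.0)
    eta_hat = np.where(d0 > 0, d1 / safe, 0.0)
    rho_hat = d0 / n if gehan else np.ones(n)

    return float(np.mean(omega * rho_hat * (z - eta_hat) * delta))
```

With all weights equal to one and the Gehan-type weight, the value coincides with the pairwise form $n^{-2}\sum_i\sum_j \Delta_i (Z_i - Z_j) 1(e_j \ge e_i)$, from which the monotonicity in $\theta$ can be read off directly.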
In this article, we focus on the estimator of $\theta$ obtained from the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$, where $\hat\eta_n$ is given in (2.2). The estimator $\hat\rho_n$ can be more flexible, but we will be particularly interested in the one given by (2.3). Using two possibly different sets of subject specific weights $\Omega_i$ and $W_i$ in $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ yields great flexibility that covers a broad range of problems. The following are a few examples: 

(i) When $\rho = 1$ and $\Omega_i = W_i = 1$ for all $i$, (2.2) becomes 

$\hat\eta_n(t,\theta) = \frac{\sum_{j=1}^{n} 1(Y_j - \theta' Z_j \ge t) Z_j}{\sum_{j=1}^{n} 1(Y_j - \theta' Z_j \ge t)},$ 

and the estimating function $\Psi_n(\theta, \hat\eta_n, 1)$ becomes the rank-based estimating function studied by [26] and [29], among others. [26] and [30] proved asymptotic linearity of $\Psi_n(\theta, \hat\eta_n, 1)$ and thus normality of the estimator obtained from $\Psi_n(\theta, \hat\eta_n, 1) = 0$ using a stochastic integral formulation and martingale theory for counting processes. 

(ii) When $\hat\rho_n$ takes the form in (2.3) and $\Omega_i = W_i = 1$ for all $i$, $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ becomes the estimating function of [26] with Gehan weights. The monotonicity of such an estimating function was studied by [7]. 

(iii) When $\hat\rho_n$ takes the form in (2.3) and $\Omega_i = 1$, $W_i = 1(i \in SC)/\Pr(i \in SC)$ for all $i$, where $SC$ denotes the set of labels of the subcohort in a case-cohort study, $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ becomes the estimating function of [16] with generalized Gehan-type weights. 

(iv) When $\hat\rho_n$ takes the form in (2.3) and $\Omega_i = 1$, $W_i = R_i/\pi_i$ for all $i$, where $\pi_i$ depends on $\Delta_i$, $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ becomes an extension of the estimating function of [31] (where the authors focused on numerical aspects and did not provide asymptotic properties). The weights $\Omega_i = 1$ and $W_i = R_i/\pi_i$ have been applied to case-cohort studies to potentially improve efficiency in the Estimator II of [2] as well as in [5, 13] for the Cox model. 

(v) When $\Omega_i = W_i = R_i/\pi_i$, the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ can be applied to a general missing data problem with covariate $Z_{1i}$ missing at random. This arises, for example, in a two-phase sampling design and yields an estimator that is similar to that proposed in [20] and further studied by [3] for the Cox model. 
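
To make the contrast between examples (iii) and (iv) concrete for a case-cohort study, the sketch below builds the two kinds of subject specific weights $W_i$ from the failure indicator and subcohort membership. The helper name `case_cohort_weights` and its arguments are hypothetical; the nonpredictable form $W_i = \Delta_i + (1 - \Delta_i) 1(i \in SC)/\pi_i$ is the one used later in Section 4.1.

```python
import numpy as np

def case_cohort_weights(delta, in_subcohort, p, predictable):
    """Subject specific weights W_i for a case-cohort design (illustrative sketch).

    delta        : failure indicators Delta_i (1 = failure observed).
    in_subcohort : indicators 1(i in SC).
    p            : subcohort sampling probability (scalar or one value per subject).
    predictable  : True  -> example (iii), weights ignore Delta_i;
                   False -> example (iv), every failure gets weight 1.
    """
    delta = np.asarray(delta, dtype=float)
    sc = np.asarray(in_subcohort, dtype=float)
    p = np.broadcast_to(np.asarray(p, dtype=float), delta.shape)
    if predictable:
        # W_i = 1(i in SC) / Pr(i in SC): failures outside the subcohort get weight 0.
        return sc / p
    # W_i = Delta_i + (1 - Delta_i) 1(i in SC) / p: depends on the outcome Delta_i.
    return delta + (1.0 - delta) * sc / p
```

The second form depends on $\Delta_i$, so it uses information that is not available before the failure time; this is why it falls outside the martingale framework.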

In examples (i), (ii) and (iii), the estimating functions can be formulated as martingales, and the related theory applies. In the last two situations, however, weights $\Omega_i$ and/or $W_i$ depend on $\Delta_i$, particularly in case-cohort studies, and, thus, are not predictable. There is no martingale representation of these weighted estimating functions. Further complications are: (1) the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ is a nonsmooth function of $\theta$, so that the methods developed for smooth estimating functions based on Taylor expansions do not apply; and (2) the nuisance parameters $\eta$ and $\rho$ are explicit functions of $\theta$, whereas usual semiparametric models assume that nuisance parameters do not vary with the parameter of interest. 

Our simulation study shows a substantial efficiency gain when such outcome-dependent weights are used and more efficiency gain when the known weights are estimated from observed data. This latter result has been often noted (see, e.g., [3, 11, 18, 22], among many others). For these reasons, it is desirable to rigorously investigate the theoretical properties of the estimators obtained from the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$, with both known and estimated flexible weights. 

3. Asymptotic properties. Assume that the observed data are i.i.d. In addition to Conditions 1-3 in [30] (also assumed in [26]), we assume Conditions (A) and (B) below and derive asymptotic properties of the estimator obtained from the weighted estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$. In particular, these results apply when $\hat\eta_n$ is given by (2.2) and $\hat\rho_n$ takes the form (2.3), which estimates $\rho_0 = \Pr(Y - \theta' Z \ge t)$ with either true weights $W_i$ or their estimates $\hat W_i$. Our method does not depend on stochastic integrals and, hence, does not require predictability of the weights. So, it applies to a much broader range of estimating functions. Note that $\hat\eta_n(t,\theta)$ in (2.2) and $\hat\rho_n(t,\theta)$ in (2.3) are not differentiable in $\theta$. 

Condition (A). There exist constants $\tau < \infty$ and $\zeta$, such that $\Pr(Y - \theta' Z \ge \tau) \ge \zeta > 0$ for all $Z$ and $\theta \in \Theta$. 

Condition (B). The selection probability $\pi = \Pr(R = 1 \mid Z_2, Y, \Delta) \ge \zeta > 0$ for all $Z_2$, $Y$ and $\Delta$ for some constant $\zeta$. 

Condition (A) follows an assumption in equation (3.1) of [26]. Condition (B) is a common assumption in the missing data literature and guarantees that the inverse selection probability weights are bounded. Using empirical process theory, we follow the idea of [26] and [30] to show the asymptotic linearity of $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ in $\theta$ in a neighborhood of the true value $\theta_0$. We adopt the empirical process notation of [27]. In particular, for a function $f$ of a random variable $U$ that follows distribution $P$, we define 

$P f = \int f(u)\,dP(u), \qquad \mathbb{P}_n f = \frac{1}{n}\sum_{i=1}^{n} f(U_i), \qquad \mathbb{G}_n f = n^{1/2}(\mathbb{P}_n - P) f$ 

and refer to [27] for all the details. Throughout the article, we assume that $\Omega_i$ and $W_i$ are bounded and satisfy $E(\Omega_i \mid X_i) = E(W_i \mid X_i) = 1$, for all $i$, and set $e_\theta = Y - \theta' Z$ and $e_0 = Y - \theta_0' Z$. 

3.1. Using true weights. Consistency and rate of convergence of the proposed estimator $\hat\theta_n$ for general $\eta$ and $\rho$ are given in Theorems 3.1 and 3.2, respectively. Asymptotic normality of $\hat\theta_n$ obtained from the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$, with $\hat\eta_n$ and $\hat\rho_n$ taking the forms in (2.2) and (2.3), is given in Theorem 3.3. Proofs are deferred to Section 6. 

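Theorem 3.1. Denote $\Psi(\theta,\eta,\rho) = P[\rho(e_\theta,\theta)\{Z - \eta(e_\theta,\theta)\}\Delta]$. Let $\Theta$, the parameter space of $\theta$, be compact, assume that $\theta_0 \in \Theta$ is the unique solution of $\Psi(\theta,\eta_0,\rho_0) = 0$ and let $\|\cdot\|$ be the supremum norm. If $\|\eta - \eta_0\| \le \delta_n$ and $\|\rho - \rho_0\| \le \delta_n$ with $\delta_n \downarrow 0$, where $\eta$, $\eta_0$, $\rho$ and $\rho_0$ belong to Glivenko-Cantelli classes and are bounded, then: 

(i) In outer probability, 

(3.1) $\|\Psi_n(\theta,\eta,\rho) - \Psi(\theta,\eta_0,\rho_0)\| \to 0;$ 

(ii) An approximate root $\hat\theta_n$ satisfying $\Psi_n(\hat\theta_n, \eta(\cdot,\hat\theta_n), \rho(\cdot,\hat\theta_n)) = o_{p^*}(1)$ is consistent; 

(iii) When $\hat\eta_n$ and $\hat\rho_n$ are given respectively by (2.2) and (2.3), an approximate root $\hat\theta_n$ satisfying $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) = o_{p^*}(1)$ is consistent. 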

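Theorem 3.2. Let $\Theta_0 \subset \Theta$ be a neighborhood of $\theta_0$, $\|\cdot\|_0$ be the supremum norm in $\Theta_0$ and $\hat\eta_n$ be as in (2.2). Assume that $\|\hat\rho_n - \rho_0\|_0 = O_{p^*}(n^{-1/2})$, and assume that both $\hat\rho_n$ and $\rho_0$ are bounded and belong to a Donsker class. Let $\hat\theta_n$ be an approximate root satisfying $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) = o_{p^*}(n^{-1/2})$. Suppose $\Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))$ is differentiable with bounded continuous derivative $\dot\Psi_\theta(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))$ in $\Theta_0$, and $\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0))$ is nonsingular. Then, $\|\hat\eta_n - \eta_0\|_0 = O_{p^*}(n^{-1/2})$ and $|\hat\theta_n - \theta_0| = O_{p^*}(n^{-1/2})$. Finally, if $\hat\rho_n$ takes the form in (2.3) and $\rho_0(t,\theta) = \Pr(e_\theta \ge t)$, then the above conditions for $\hat\rho_n$ and $\rho_0$ are satisfied. 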

In the proofs of the above theorems, given in Section 6, we apply the permanence of the Donsker property under closures and convex hulls (see [27]) to show that (2.2) and (2.3) and their limits are Donsker. A variety of sufficient conditions for Donsker classes of functions are provided in [27]. 

When $\hat\eta_n$ takes the form in (2.2), the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ is discontinuous in $\theta$. In the case of full cohort data with $\Omega_i = W_i = 1$ for all $i$, [21, 26, 30] showed, with considerable effort, the asymptotic linearity of $\Psi_n(\theta, \hat\eta_n, 1)$, in a neighborhood of the true parameter $\theta_0$, in order to prove asymptotic normality. [16] had equally complicated arguments for asymptotic linearity in case-cohort studies where the weights $W_i$ do not depend on $\Delta_i$. We avoid the stochastic integral formulation and apply empirical process theory to show the asymptotic linearity of $\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta))$ around $\theta_0$ for the class of missing data problems considered here. In Theorem 3.3, we focus on the situation where $\hat\eta_n$ and $\hat\rho_n$ are, respectively, given by (2.2) and (2.3). For other types of bounded weight functions $\hat\rho_n$ and $\rho_0$, proofs of asymptotic normality follow the same steps, and the same asymptotic representation should hold if $\{\hat\rho_n\}$ and $\{\rho_0\}$ are Donsker and $\hat\rho_n$ is an asymptotically linear estimator. This approach handles both predictable and nonpredictable weights. 

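Theorem 3.3. Let $\hat\eta_n$ and $\hat\rho_n$ be as in (2.2) and (2.3). Let $\hat\theta_n$ be an approximate root satisfying $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) = o_{p^*}(n^{-1/2})$. Let $\mathcal{Y}$ and $\mathcal{Z}$ denote the sample spaces of random variables $Y$ and $Z$, respectively. Suppose that $\rho_0(e_\theta,\theta)$ and $\eta_0(e_\theta,\theta)$ are differentiable in $\theta$ with derivatives $\dot\rho_{0\theta}$ and $\dot\eta_{0\theta}$, which are uniformly bounded and continuous in $\Theta_0 \times \mathcal{Y} \times \mathcal{Z}$. Note that this implies that $\Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))$ is differentiable in $\theta$ with bounded continuous derivative $\dot\Psi_\theta(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))$ in $\Theta_0$. Then, we have the following: 

(i) The asymptotic linearity 

(3.2) $n^{1/2}\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) = n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) + n^{1/2}(\hat\theta_n - \theta_0)\,\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1)$ 

holds; 

(ii) If $\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0))$ is nonsingular, then $n^{1/2}(\hat\theta_n - \theta_0)$ is asymptotically normal with the asymptotic representation 

(3.3) $n^{1/2}(\hat\theta_n - \theta_0) = \{-\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0))\}^{-1}\, \mathbb{G}_n\Big[\Omega\rho_0(e_0,\theta_0)\{Z - \eta_0(e_0,\theta_0)\}\Delta - \int W\rho_0(t,\theta_0)\{Z - \eta_0(t,\theta_0)\}1(e_0 \ge t)\,d\Lambda_0(t)\Big] + o_{p^*}(1).$ 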



Remark. As becomes clear in the proof of Theorem 3.3, the asymptotic representation (3.3) is the same if the weight function $\rho_0(t,\theta)$ is known, and, in fact, such a property does not depend on what $\rho_0(t,\theta)$ is. This finding is consistent with the claim in Section 4 of [26]. Equation (3.3) reduces to the result of [16] for predictable $W$ when $\Omega = 1$ and $\rho_0(t,\theta) = 1$. The variance estimator for $\hat\theta_n$ can be obtained following the method described in [16] based on the asymptotic representation (3.3) and the original idea of [9]. Alternative variance estimation methods can be found in [10, 17]. Later, in Section 4.1, we show that letting $\Omega = W$ yields more efficient estimation for the example of a case-cohort study. 

3.2. Using estimated weights. In Theorems 3.1, 3.2 and 3.3, the subject-specific weights $W_i$ and $\Omega_i$ are assumed to be known. This is a reasonable assumption for many types of sampling designs when weights are the inverse of sampling probabilities, because sampling probabilities are usually prespecified by investigators. In the missing data literature, many authors (e.g., [22] and [3]) have pointed out that using the estimated weights improves the asymptotic efficiency, even though the true weights are known. Suppose true weights $W_i$ are parameterized by $\alpha$ with true value $\alpha_0$; that is, 

$W_i = W(X_i; \alpha_0), \quad i = 1,\dots,n.$ 

Let $\hat\alpha_n$ be an estimator of $\alpha$. Then, we can estimate $W_i$ by 

$\hat W_i = W(X_i; \hat\alpha_n), \quad i = 1,\dots,n.$ 

In this subsection, we take $\Omega_i = W_i$, $i = 1,\dots,n$, for simplicity, and we consider the asymptotic properties of the estimator $\hat\theta_n^*$, which is obtained from the following semiparametric estimating function with estimated weights: 

(3.4) $\Psi_n^*(\theta, \eta_n^*, \rho_n^*) = \frac{1}{n}\sum_{i=1}^{n} \hat W_i\, \rho_n^*(Y_i - \theta' Z_i, \theta)\{Z_i - \eta_n^*(Y_i - \theta' Z_i, \theta)\}\Delta_i,$ 

where 

(3.5) $\eta_n^*(t,\theta) = \frac{\sum_{j=1}^{n} \hat W_j 1(Y_j - \theta' Z_j \ge t) Z_j}{\sum_{j=1}^{n} \hat W_j 1(Y_j - \theta' Z_j \ge t)}$ 

and 

(3.6) $\rho_n^*(t,\theta) = \frac{1}{n}\sum_{j=1}^{n} \hat W_j 1(Y_j - \theta' Z_j \ge t).$ 

This case $\Omega_i = W_i$ handles the case-cohort study, naturally, when inverse sampling probability weights are used for which $\Omega_i = W_i = 1$ whenever $\Delta_i = 1$. Note that the estimating function (3.4) is obtained by replacing known weights $W_i$ with their estimates $\hat W_i$ in $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$, $\hat\eta_n$ and $\hat\rho_n$; see (2.2) and (2.3). As in Theorem 3.3, the following result holds for other types of bounded weight function $\rho_0$ and estimator $\rho_n^*$, provided that $\{\rho_n^*\}$ and $\{\rho_0\}$ are Donsker, and that $\rho_n^*$, as a function of $\alpha$, is an asymptotically linear estimator that is twice continuously differentiable in $\alpha$ with the first-order derivative converging to an integrable limit at $\alpha_0$. The latter remark becomes clear in the proof of the next theorem. 

We now consider consistency and asymptotic normality of $\hat\theta_n^*$ in Theorem 3.4 with a reasonable assumption about $\hat\alpha_n$ and a classical smoothness condition for $W(X;\alpha)$ in $\alpha$. The efficiency gain from using estimated weights becomes evident. 

Theorem 3.4. Suppose that $W(X;\alpha)$ is twice differentiable, with respect to $\alpha$, in $\mathcal{A}_0 \times \mathcal{X}$ with continuous and bounded derivatives, where $\mathcal{A}_0$ is a neighborhood of $\alpha_0$ and $\mathcal{X}$ is the bounded sample space of the random variable $X$. Suppose that $\hat\alpha_n$ is an asymptotically efficient estimator of $\alpha_0$ with bounded influence function at $\alpha_0$. Let $\eta_n^*$ and $\rho_n^*$ be defined by (3.5) and (3.6), and let $\hat\theta_n^*$ be an approximate root satisfying the equation $\Psi_n^*(\hat\theta_n^*, \eta_n^*(\cdot,\hat\theta_n^*), \rho_n^*(\cdot,\hat\theta_n^*)) = o_{p^*}(n^{-1/2})$. Suppose that all the assumptions in Theorem 3.3 hold. Then, $\hat\theta_n^*$ is consistent, and $n^{1/2}(\hat\theta_n^* - \theta_0)$ is asymptotically normal with zero mean and the asymptotic variance 

(3.7) $\Sigma_0 - \{\dot\Psi_\theta(\theta_0,\eta_0,\rho_0)\}^{-1} B V_0 B' \{\dot\Psi_\theta(\theta_0,\eta_0,\rho_0)\}^{-1},$ 

where $\Sigma_0$ is the asymptotic variance of $n^{1/2}(\hat\theta_n - \theta_0)$ determined by (3.3), $V_0$ is the asymptotic variance of $n^{1/2}(\hat\alpha_n - \alpha_0)$, and 

$B = P[\rho_0(e_0,\theta_0) A_2(e_0,\theta_0)\Delta] - P[\rho_0(e_0,\theta_0)\{Z - \eta_0(e_0,\theta_0)\}(\dot W_\alpha(X;\alpha_0))'\Delta],$ 

with $\dot W_\alpha(X;\alpha)$ denoting the $\alpha$-derivative of $W(X;\alpha)$ and 

$A_2(t,\theta_0) = \frac{1}{\rho_0(t,\theta_0)}\big[P\{1(e_0 \ge t) Z (\dot W_\alpha(X;\alpha_0))'\} - \eta_0(t,\theta_0) P\{(\dot W_\alpha(X;\alpha_0))' 1(e_0 \ge t)\}\big].$ 

Note that, if $\rho_n^* = \hat\rho_n = 1$, then $\rho_0(t,\theta_0)$ in the above expression for $A_2$ should be replaced by $P\{1(e_0 \ge t)\}$. The asymptotic efficiency of $\hat\alpha_n$ is one of three sufficient conditions for applying the result of [18] to obtain the above asymptotic normality of $\hat\theta_n^*$. When data are missing at random and inverse sampling probability weights are considered, the parameter $\alpha$ is adaptive to other parameters (see [1]) and its efficient estimator can be easily obtained, for example, by the maximum likelihood method. In sampling designs, a stratified approach is commonly used to improve efficiency. If the number of strata is finite, then the (independent Bernoulli) sampling probabilities within strata constitute the parameter $\alpha$, and the sampling fractions are the maximum likelihood estimates of $\alpha$. 

The other two conditions of [18] are: (i) $n^{1/2}(\hat\theta_n - \theta_0)$ and $n^{1/2}(\hat\alpha_n - \alpha_0)$ are asymptotically jointly normal; and (ii) $n^{1/2}(\hat\theta_n^* - \theta_0)$ is asymptotically equivalent to $n^{1/2}(\hat\theta_n - \theta_0) + B n^{1/2}(\hat\alpha_n - \alpha_0)$. The former is determined by (3.3) in Theorem 3.3 and the fact that $\hat\alpha_n$ is an asymptotically linear estimator. The latter is established with a detailed proof in Section 6. 

Consider a stratified case-cohort study. Suppose that all the censored subjects in a study cohort are divided into $S$ strata by the variable $Z_2 \in \{\zeta_1,\dots,\zeta_S\}$. In a stratified case-cohort study, all of the failures are completely observed. For censored subjects, we denote the true sampling probabilities by $\alpha_{0s}$, $1 \le s \le S$. Suppose that there are $n_s$ subjects in stratum $s$, out of whom $n_s^*$ are selected into the subcohort by independent Bernoulli sampling. We assume that, when $n \to \infty$, $n_s/n \to \gamma_s > 0$, $1 \le s \le S$. Instead of using the true sampling probabilities $\alpha_0 = (\alpha_{01},\dots,\alpha_{0S})'$ in the weight function $W$, we now replace each $\alpha_{0s}$ with the sampling fraction $\hat\alpha_{n,s} = n_s^*/n_s$, $1 \le s \le S$. We can then denote the sampling probability and its estimator of the $i$th subject as 

$\pi_i = \sum_{s=1}^{S} 1(Z_{2i} = \zeta_s)\alpha_{0s} \quad\text{and}\quad \hat\pi_i = \sum_{s=1}^{S} 1(Z_{2i} = \zeta_s)\hat\alpha_{n,s}.$ 

We consider the inverse sampling probability weights 

$W(X_i; \hat\alpha_n) = \Delta_i + (1 - \Delta_i)\frac{R_i}{\hat\pi_i}.$ 

The second term in the expression for matrix $B$ in Theorem 3.4 becomes zero, since $\dot W_\alpha$ contains the factor $(1 - \Delta)$. The asymptotic variance of $\hat\alpha_n$ is $V_0 = \mathrm{diag}\{\alpha_{01}(1 - \alpha_{01})/\gamma_1,\dots,\alpha_{0S}(1 - \alpha_{0S})/\gamma_S\}$, which can be easily estimated from observed data. 
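
The stratified calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimated_case_cohort_weights`, its argument layout and the plug-in estimate of $V_0$ (replacing $\alpha_{0s}$ and $\gamma_s$ by $n_s^*/n_s$ and $n_s/n$) are choices made here, not taken from the original.

```python
import numpy as np

def estimated_case_cohort_weights(delta, stratum, sampled):
    """Estimated inverse sampling probability weights for a stratified case-cohort design.

    delta   : failure indicators Delta_i.
    stratum : stratum label of each subject (used for censored subjects only).
    sampled : R_i, 1 if the subject was selected into the subcohort.
    Returns (w_hat, labels, alpha_hat, v0_hat).
    """
    delta = np.asarray(delta, dtype=float)
    stratum = np.asarray(stratum)
    sampled = np.asarray(sampled, dtype=float)
    n = delta.size
    censored = delta == 0

    labels = np.unique(stratum[censored])
    alpha_hat = np.empty(labels.size)
    gamma_hat = np.empty(labels.size)
    pi_hat = np.ones(n)                        # failures are always fully observed
    for k, s in enumerate(labels):
        in_s = censored & (stratum == s)
        n_s = in_s.sum()                       # censored subjects in stratum s
        n_s_star = sampled[in_s].sum()         # of whom this many were sampled
        if n_s_star == 0:
            raise ValueError(f"no sampled censored subjects in stratum {s}")
        alpha_hat[k] = n_s_star / n_s          # sampling fraction
        gamma_hat[k] = n_s / n
        pi_hat[in_s] = alpha_hat[k]

    # W(X_i; alpha_hat) = Delta_i + (1 - Delta_i) R_i / pi_hat_i
    w_hat = delta + (1.0 - delta) * sampled / pi_hat
    # Plug-in estimate of V_0 = diag{alpha_s (1 - alpha_s) / gamma_s}
    v0_hat = np.diag(alpha_hat * (1.0 - alpha_hat) / gamma_hat)
    return w_hat, labels, alpha_hat, v0_hat
```

A convenient check is that, within each stratum, the estimated weights of the censored subjects sum exactly to $n_s$, which is one informal way to see why sampling fractions calibrate the weights to the observed cohort.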



4. Numerical results. 



4.1. Asymptotic efficiency comparison. Considering the standard normal, standard logistic and standard extreme value error distributions in model (1.1), we evaluate asymptotic efficiency under a case-cohort setting to illustrate different extents of efficiency gain by using different weights. The one-dimensional covariate $Z$ is taken to follow a Bernoulli distribution with success probability 0.3 and $\theta_0 = 0$. Censoring time has a uniform distribution on $[a, b]$, where $a$ and $b$ are chosen to obtain 80% censoring proportion. Let $Z^*$ be a binary correlate of $Z$ with $\Pr(Z^* = 1 \mid Z = 1) = 0.8$ and $\Pr(Z^* = 0 \mid Z = 0) = 0.8$. The subcohort is a stratified subsample selected by independent Bernoulli sampling with selection probability $\pi(Z^*)$, chosen so that the two strata determined by $Z^*$ have the same expected number of subjects. 
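
One possible reading of this design is that the two $Z^*$ strata contribute equal expected numbers of subjects to a subcohort of overall fraction $f$. Under that assumption (which is an interpretation, not something the text spells out), the selection probabilities follow from $\Pr(Z^* = 1) = 0.3 \times 0.8 + 0.7 \times 0.2 = 0.38$. The helper `stratum_selection_probs` below is hypothetical and only illustrates the arithmetic.

```python
def stratum_selection_probs(f, p_z=0.3, p_match=0.8):
    """Selection probabilities (pi(Z*=0), pi(Z*=1)) giving equal expected stratum counts.

    f       : overall expected subcohort fraction.
    p_z     : Pr(Z = 1).
    p_match : Pr(Z* = 1 | Z = 1) = Pr(Z* = 0 | Z = 0).
    Assumes each Z* stratum should contribute an expected fraction f / 2 of the cohort.
    """
    p1 = p_z * p_match + (1.0 - p_z) * (1.0 - p_match)   # Pr(Z* = 1)
    p0 = 1.0 - p1                                        # Pr(Z* = 0)
    pi1 = f / (2.0 * p1)
    pi0 = f / (2.0 * p0)
    if max(pi0, pi1) > 1.0:
        # Equal expected counts are not attainable for this f under this reading.
        raise ValueError("subcohort fraction too large for equal stratum sizes")
    return pi0, pi1
```

For example, `stratum_selection_probs(0.2)` oversamples the smaller stratum $Z^* = 1$ relative to $Z^* = 0$; under this reading the construction is feasible only while $f \le 2\Pr(Z^* = 1) = 0.76$.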

For each error distribution, we consider a $2^3$ factorial design with the following factors: 

• logrank weights ($\hat\rho_n = 1$) and Gehan weights [see (2.3)]; 

• subject specific weight: predictable with $W_i = 1(i \in SC)/\pi_i$ and nonpredictable with $W_i = \Delta_i + (1 - \Delta_i)1(i \in SC)/\pi_i$; 

• subject specific weights: true $W_i = W(X_i; \alpha_0)$ and estimated $\hat W_i = W(X_i; \hat\alpha_n)$. 

The asymptotic variance of the logrank weighted method for the full cohort is used as the benchmark, and we report the relative efficiency for each of the 8 scenarios with subcohort size fraction ranging from 1% to 100%. Results are given in Figures 1-3, where: (1) dark curves represent logrank weights, and gray curves represent Gehan weights; (2) solid curves represent predictable known weights, and dotted curves represent predictable estimated weights; and (3) dashed curves represent nonpredictable known weights, and dotted/dashed curves represent nonpredictable estimated weights. 

We can see that using estimated weights $W(X_i; \hat\alpha_n)$ does not improve efficiency very much compared to using true weights $W(X_i; \alpha_0)$ for the settings considered. The efficiency gain from using the nonpredictable weights is substantial, especially for small to moderate sampling rates. An interesting feature is that when the subcohort size is relatively small, the Gehan weighted method performs much better than the logrank weighted method for all three error distributions, even though the result is opposite when subcohort size is close to the full cohort for both logistic and extreme value error distributions. We do not have an analytical explanation for this phenomenon, which seems to persist in other simulations as well. It seems safe, however, to recommend the Gehan weights for the problems with missing data; it is fortuitous that the Gehan weights also yield a monotone estimating function, which is a numerically advantageous property. Another interesting phenomenon is that, for the logistic error, the Gehan weights may be somewhat less efficient than the logrank weights for censored data, even though they are the most efficient for uncensored data (see [12]). 

Fig. 1. Asymptotic efficiency under normal error distribution. 

Fig. 2. Asymptotic efficiency under logistic error distribution. 

Fig. 3. Asymptotic efficiency under extreme value error distribution. 

4.2. Simulations. We conduct simulations under the same settings as those in the previous subsection. Since the simulation results tell essentially the same story for different error distributions, we only report the results for the logistic error. We consider case-cohort designs with cohort size of 2000 and subcohort sizes of 15%, 20% and 25% of the entire cohort on average, which lead to on average 640, 720 and 800 completely observed subjects, respectively. Bias of the point estimator, average of the variance estimator, empirical variance and 95% coverage probability, based on the variance estimator, are reported for five different analyses using the following logrank and Gehan weights: full data analysis, predictable subject-specific weighted analysis using true weights, predictable subject-specific weighted analysis using estimated weights, nonpredictable subject-specific weighted analysis using true weights and nonpredictable subject-specific weighted analysis using estimated weights. The asymptotic variance for each scenario is also reported. From Table 1, we see that all of the methods work well for finite samples and reflect the patterns observed from the efficiency results in the previous subsection. 
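
The counts of completely observed subjects quoted above can be checked with a short calculation: all failures are fully observed, and so are the censored members of the subcohort. The sketch below assumes that censoring status is independent of subcohort selection, which is plausible in this design because $\theta_0 = 0$; the function name `expected_complete` is introduced here only for illustration.

```python
def expected_complete(n, censoring_rate, subcohort_fraction):
    """Expected number of completely observed subjects in a case-cohort design.

    Assumes censoring status is independent of subcohort membership.
    """
    failures = n * (1.0 - censoring_rate)                     # always fully observed
    censored_in_subcohort = n * subcohort_fraction * censoring_rate
    return failures + censored_in_subcohort

for fraction in (0.15, 0.20, 0.25):
    print(fraction, round(expected_complete(2000, 0.8, fraction)))
```

With 2000 subjects and 80% censoring there are 400 expected failures, and the three subcohort fractions add 240, 320 and 400 censored subcohort members, respectively.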

5. Discussion. We consider only the case where weights $\Omega_i$ and $W_i$ are i.i.d. for all $i = 1,\dots,n$, which makes the proofs of the asymptotic properties more straightforward. For the case where the weights are determined by (stratified) simple random sampling, the method of [3] may be applicable, and this is an interesting topic worthy of further investigation. 

6. Proofs. 

6.1. Proof of Theorem 3.1. As in [26], for notational simplicity, we assume one-dimensional $\theta$ in the proofs of the theorems in Section 3. 

Since $\eta$, $\eta_0$, $\rho$ and $\rho_0$ belong to Glivenko-Cantelli classes, it follows, from Theorem 3 of [28], that the set of bounded functions $\{\Omega\rho(e_\theta,\theta)\{Z - \eta(e_\theta,\theta)\}\Delta\}$ is a Glivenko-Cantelli class. By adding and subtracting the same term, and by the triangle inequality, we then have that 

$\|\Psi_n(\theta,\eta,\rho) - \Psi(\theta,\eta_0,\rho_0)\|$ 

$= \|\mathbb{P}_n[\Omega\rho(e_\theta,\theta)\{Z - \eta(e_\theta,\theta)\}\Delta] - P[\Omega\rho_0(e_\theta,\theta)\{Z - \eta_0(e_\theta,\theta)\}\Delta]\|$ 

$\le \|(\mathbb{P}_n - P)[\Omega\rho(e_\theta,\theta)\{Z - \eta(e_\theta,\theta)\}\Delta]\| + \|P\{\Omega(\rho - \rho_0)Z\Delta\}\| + \|P\{\Omega(\rho\eta - \rho_0\eta_0)\Delta\}\|.$ 

The first term on the right-hand side of the above inequality converges to zero in outer probability by the Glivenko-Cantelli property. Obviously, 

$\|P\{\Omega(\rho - \rho_0)Z\Delta\}\| \le \|\rho - \rho_0\|\,P|\Omega Z\Delta| \to 0$ 

and 

$\|P\{\Omega(\rho\eta - \rho_0\eta_0)\Delta\}\| \le \|\rho\eta - \rho_0\eta_0\|\,P|\Omega\Delta| \to 0,$ 






Table 1 

Summary statistics of simulations, where a = subcohort size fraction; Method 1 = full data analysis, 2 = predictable subject-specific weighted analysis using true weights, 3 = predictable subject-specific weighted analysis using estimated weights, 4 = nonpredictable subject-specific weighted analysis using true weights, 5 = nonpredictable subject-specific weighted analysis using estimated weights; Emp. Var = empirical variance estimator; Ave. Var = average of variance estimator; CP = coverage probability; Asym. Var = asymptotic variance 

a     Weight   Method  Bias    Emp. Var  Ave. Var  95% CP  Asym. Var
0.15  Logrank  1       -0.001  0.018     0.019     95.6    0.018
               2        0.019  0.074     0.075     93.2    0.073
               3        0.018  0.066     0.072     94.4    0.069
               4        0.015  0.056     0.059     95.4    0.059
               5        0.015  0.052     0.058     95.8    0.056
      Gehan    1        0.006  0.020     0.020     96.6    0.020
               2        0.018  0.047     0.047     94.0    0.047
               3        0.016  0.040     0.042     95.4    0.044
               4        0.015  0.038     0.039     96.4    0.039
               5        0.014  0.034     0.036     96.2    0.037
0.20  Logrank  1       -0.001  0.018     0.019     95.6    0.018
               2        0.007  0.060     0.059     94.0    0.056
               3        0.008  0.055     0.057     94.8    0.054
               4        0.006  0.049     0.048     93.0    0.046
               5        0.007  0.046     0.046     94.6    0.045
      Gehan    1        0.006  0.020     0.020     96.6    0.020
               2        0.011  0.039     0.039     96.0    0.039
               3        0.012  0.035     0.035     95.6    0.037
               4        0.011  0.034     0.033     95.2    0.033
               5        0.011  0.031     0.031     95.8    0.032
0.25  Logrank  1       -0.001  0.018     0.019     95.6    0.018
               2        0.003  0.048     0.049     94.0    0.047
               3        0.004  0.043     0.047     95.8    0.045
               4        0.002  0.040     0.041     94.4    0.039
               5        0.003  0.038     0.040     94.8    0.038
      Gehan    1        0.006  0.020     0.020     96.6    0.020
               2        0.007  0.034     0.034     95.6    0.034
               3        0.008  0.031     0.032     95.4    0.033
               4        0.008  0.030     0.030     95.0    0.030
               5        0.008  0.028     0.029     94.8    0.029


where 

$\|\rho\eta - \rho_0\eta_0\| = \tfrac{1}{2}\|(\rho - \rho_0)(\eta + \eta_0) + (\rho + \rho_0)(\eta - \eta_0)\| \to 0.$ 
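
For readers who want the intermediate step, the product-difference identity used here can be verified by direct expansion. The display below is a routine check added for convenience, not part of the original argument.

```latex
\begin{aligned}
(\rho-\rho_0)(\eta+\eta_0) + (\rho+\rho_0)(\eta-\eta_0)
  &= \rho\eta + \rho\eta_0 - \rho_0\eta - \rho_0\eta_0
   + \rho\eta - \rho\eta_0 + \rho_0\eta - \rho_0\eta_0 \\
  &= 2(\rho\eta - \rho_0\eta_0), \\[4pt]
\|\rho\eta - \rho_0\eta_0\|
  &\le \tfrac{1}{2}\bigl(\|\rho-\rho_0\|\,\|\eta+\eta_0\|
   + \|\rho+\rho_0\|\,\|\eta-\eta_0\|\bigr).
\end{aligned}
```

Since $\eta$, $\eta_0$, $\rho$ and $\rho_0$ are bounded and $\|\eta - \eta_0\| \le \delta_n$, $\|\rho - \rho_0\| \le \delta_n$ with $\delta_n \downarrow 0$, the right-hand side tends to zero.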

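This establishes (3.1), which, in turn, can be shown to imply $|\hat\theta_n - \theta_0| \to 0$ in outer probability, as in [8]. For completeness, we include the argument here. 

Since $\theta_0$ is the unique solution to $\Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta)) = 0$, for any fixed $\varepsilon > 0$, there exists a $\delta > 0$ such that 

$P[|\hat\theta_n - \theta_0| > \varepsilon] \le P[|\Psi(\hat\theta_n, \eta_0(\cdot,\hat\theta_n), \rho_0(\cdot,\hat\theta_n))| > \delta].$ 

We show that $|\Psi(\hat\theta_n, \eta_0(\cdot,\hat\theta_n), \rho_0(\cdot,\hat\theta_n))| \to 0$ in outer probability, and the consistency of $\hat\theta_n$ follows immediately. Note that there exists a sequence $\{\delta_n\} \downarrow 0$ such that $\|\eta - \eta_0\| \le \delta_n$ and $\|\rho - \rho_0\| \le \delta_n$ with probability tending to one. Hence, from (3.1), we have the inequalities 

$|\Psi(\hat\theta_n, \eta_0(\cdot,\hat\theta_n), \rho_0(\cdot,\hat\theta_n))|$ 

$\le |\Psi_n(\hat\theta_n, \eta(\cdot,\hat\theta_n), \rho(\cdot,\hat\theta_n))| + |\Psi(\hat\theta_n, \eta_0(\cdot,\hat\theta_n), \rho_0(\cdot,\hat\theta_n)) - \Psi_n(\hat\theta_n, \eta(\cdot,\hat\theta_n), \rho(\cdot,\hat\theta_n))|$ 

$\le |\Psi_n(\hat\theta_n, \eta(\cdot,\hat\theta_n), \rho(\cdot,\hat\theta_n))| + o_{p^*}(1)$ 

$= o_{p^*}(1).$ 

Hence, $\hat\theta_n$ is consistent. 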

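We now show that (3.1) holds when $\eta$ and $\rho$ are replaced by $\hat\eta_n$ and $\hat\rho_n$ given in (2.2) and (2.3), respectively, and $\rho_0(t,\theta) = \Pr(e_\theta \ge t)$. We define
\begin{align*}
D_n^{(0)}(t,\theta) &= \mathbb{P}_n\{W1(e_\theta \ge t)\}, & d^{(0)}(t,\theta) &= P\{W1(e_\theta \ge t)\}; \\
D_n^{(1)}(t,\theta) &= \mathbb{P}_n\{W1(e_\theta \ge t)Z\}, & d^{(1)}(t,\theta) &= P\{W1(e_\theta \ge t)Z\}.
\end{align*}

Thus, $\hat\eta_n(t,\theta) = D_n^{(1)}(t,\theta)/D_n^{(0)}(t,\theta)$ and $\eta_0(t,\theta) = d^{(1)}(t,\theta)/d^{(0)}(t,\theta)$. The latter equality holds because
\[
P\{W1(e_\theta \ge t)\} = P\{1(e_\theta \ge t)\} \quad\text{and}\quad P\{W1(e_\theta \ge t)Z\} = P\{1(e_\theta \ge t)Z\}.
\]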

Since the class of functions $\{1(e_\theta \ge t)\}$ is a VC-class (see, e.g., Exercise 9 on page 151 and Exercise 14 on page 152 in [27]) and, thus, a Donsker class, we know that the sets of functions $\mathcal{F}_0 = \{W1(e_\theta \ge t)\}$ and $\mathcal{F}_1 = \{W1(e_\theta \ge t)Z\}$ are Donsker classes (see, e.g., [27], Section 2.10). Since Donsker classes are Glivenko-Cantelli classes, it follows that $\|D_n^{(k)}(t,\theta) - d^{(k)}(t,\theta)\| \to 0$ in outer probability, $k = 0, 1$. Let $\tau$ correspond to $T^*$ in [26] and represent the longest follow-up time. Since both $D_n^{(0)}$ (with probability 1) and $d^{(0)}$ are bounded away from zero when $t \le \tau$, we have
\[
\|\hat\eta_n(t,\theta) - \eta_0(t,\theta)\| \to 0 \tag{6.1}
\]
in outer probability. Similarly, we have
\[
\|\hat\rho_n(t,\theta) - \rho_0(t,\theta)\| \to 0 \tag{6.2}
\]
in outer probability.
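
The weighted averages $D_n^{(0)}$ and $D_n^{(1)}$ and the plug-in quantities $\hat\eta_n$ and $\hat\rho_n$ can be sketched numerically as follows. This is only an illustration under stated assumptions: the covariate is taken to be scalar, $\hat\rho_n$ is taken in the form $D_n^{(0)}/C_n$ used later in the proofs, and the function names (`residuals`, `d_n`, `eta_hat`, `rho_hat`) and the toy data are hypothetical, not part of the paper.

```python
def residuals(y, z, theta):
    """Residuals e_theta = Y - theta * Z (scalar covariate assumed)."""
    return [yi - theta * zi for yi, zi in zip(y, z)]


def d_n(t, e, w, z=None):
    """Weighted empirical average P_n{W 1(e >= t) Z^k}.

    With z=None this is D_n^(0)(t); with z given it is D_n^(1)(t).
    """
    n = len(e)
    total = 0.0
    for i in range(n):
        if e[i] >= t:  # subject i is still at risk at residual time t
            total += w[i] * (z[i] if z is not None else 1.0)
    return total / n


def eta_hat(t, e, w, z):
    """Weighted at-risk covariate average D_n^(1)(t) / D_n^(0)(t).

    Raises ZeroDivisionError if no weighted subject is at risk at t.
    """
    return d_n(t, e, w, z) / d_n(t, e, w)


def rho_hat(t, e, w):
    """Weighted at-risk proportion D_n^(0)(t) / C_n, with C_n the mean weight."""
    c_n = sum(w) / len(w)
    return d_n(t, e, w) / c_n


# Toy data (hypothetical). A weight of 0 marks a subject whose covariate
# is treated as unobserved, so its z value never enters any average.
y = [1.0, 2.0, 3.0, 5.0]
z = [0.0, 1.0, 0.0, 3.0]
w = [1.0, 2.0, 0.0, 1.0]
e = residuals(y, z, theta=0.5)
```

Both estimators are ratios of weighted empirical averages, which is why the uniform convergence of $D_n^{(k)}$ together with a denominator bounded away from zero is enough for (6.1) and (6.2).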

Let $\bar{\mathcal{F}}_k$ be the closure of $\mathcal{F}_k$, $k = 0, 1$, respectively, in which the convergence is both pointwise and in $L_2(P)$. Then, $D_n^{(k)}(t,\theta)$ and $d^{(k)}(t,\theta)$ are in the convex hull of $\bar{\mathcal{F}}_k$, $k = 0, 1$, and, thus, belong to Donsker classes (see, e.g., [27], Theorems 2.10.2 and 2.10.3). Hence, both $\{\hat\eta_n(t,\theta)\}$ and $\{\eta_0(t,\theta)\}$ are Donsker (by [27], Example 2.10.9) and, thus, Glivenko-Cantelli. Similarly, we can argue that both $\{\hat\rho_n(t,\theta)\}$ and $\{\rho_0(t,\theta)\}$ are Donsker and, hence, Glivenko-Cantelli. Then, by the first half of the proof we obtain
\[
\|\Psi_n(\theta, \hat\eta_n, \hat\rho_n) - \Psi(\theta, \eta_0, \rho_0)\| \to 0
\]
in outer probability.

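6.2. Proof of Theorem 3.2. From the proof of Theorem 3.1 we see that $n^{1/2}\{D_n^{(k)}(t,\theta) - d^{(k)}(t,\theta)\}$, $k = 0, 1$, converge to zero-mean Gaussian processes for all $\theta \in \Theta_0$, and $\|n^{1/2}\{D_n^{(k)}(t,\theta) - d^{(k)}(t,\theta)\}\|_0 = O_{p^*}(1)$, $k = 0, 1$, by the tail bounds for the supremum of empirical processes in [27], Section 2.14. We then have
\begin{align*}
&n^{1/2}\{\hat\eta_n(t,\theta) - \eta_0(t,\theta)\} \\
&\quad = n^{1/2}\left[\frac{1}{D_n^{(0)}(t,\theta)}\{D_n^{(1)}(t,\theta) - d^{(1)}(t,\theta)\} - \frac{d^{(1)}(t,\theta)}{D_n^{(0)}(t,\theta)\,d^{(0)}(t,\theta)}\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\}\right] \\
&\quad = n^{1/2}\left[\frac{1}{d^{(0)}(t,\theta)}\{D_n^{(1)}(t,\theta) - d^{(1)}(t,\theta)\} - \frac{d^{(1)}(t,\theta)}{d^{(0)}(t,\theta)^2}\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\}\right] + o_{p^*}(1) \\
&\quad = d^{(0)}(t,\theta)^{-1}n^{1/2}\big[\{D_n^{(1)}(t,\theta) - D_n^{(0)}(t,\theta)\eta_0(t,\theta)\} - \{d^{(1)}(t,\theta) - d^{(0)}(t,\theta)\eta_0(t,\theta)\}\big] + o_{p^*}(1) \\
&\quad = d^{(0)}(t,\theta)^{-1}\,\mathbb{G}_n[W1(e_\theta \ge t)\{Z - \eta_0(t,\theta)\}] + o_{p^*}(1).
\end{align*}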
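
The first equality in the display above is a purely algebraic identity for a difference of ratios, and the second replaces $D_n^{(0)}$ by $d^{(0)}$ in the denominators at second-order cost. A small exact-arithmetic check of both facts is sketched below; the function names and the numerical values are hypothetical and only illustrate the algebra.

```python
from fractions import Fraction


def exact_split(D1, D0, d1, d0):
    """Return both sides of
    D1/D0 - d1/d0 = (D1 - d1)/D0 - d1 (D0 - d0) / (D0 d0)."""
    lhs = D1 / D0 - d1 / d0
    rhs = (D1 - d1) / D0 - d1 * (D0 - d0) / (D0 * d0)
    return lhs, rhs


def linearized(D1, D0, d1, d0):
    """First-order version with d0 replacing D0 in the denominators."""
    return (D1 - d1) / d0 - d1 * (D0 - d0) / d0 ** 2


# Hypothetical limits d1, d0 and a small common perturbation h.
d1, d0 = Fraction(1, 2), Fraction(3, 4)
h = Fraction(1, 1000)
lhs, rhs = exact_split(d1 + h, d0 + h, d1, d0)
gap = abs(lhs - linearized(d1 + h, d0 + h, d1, d0))
```

Here `gap` is of order `h ** 2`, which mirrors why the replacement only costs an $o_{p^*}(1)$ term after multiplying by $n^{1/2}$ when the perturbations are $O_{p^*}(n^{-1/2})$.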

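Since the classes of functions $\{W\}$, $\{1(e_\theta \ge t)\}$, $\{Z\}$ and $\{\eta_0\}$ are Donsker, we know that $\{W1(e_\theta \ge t)\{Z - \eta_0(t,\theta)\}\}$ is Donsker (e.g., [27], Section 2.10). Thus, $n^{1/2}\|\hat\eta_n - \eta_0\|_0 = O_{p^*}(1)$, since $d^{(0)}(t,\theta)^{-1}$ is bounded.

We now show $n^{1/2}|\hat\theta_n - \theta_0| = O_{p^*}(1)$. First, we have
\[
\|n^{1/2}\{\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\}\|_0 = O_{p^*}(1) \tag{6.3}
\]
by applying the triangle inequality, and that $\{\hat\eta_n\}$ and $\{\hat\rho_n\}$ are Donsker, as well as $n^{1/2}\|\hat\rho_n - \rho_0\|_0 = O_{p^*}(1)$ and $n^{1/2}\|\hat\eta_n - \eta_0\|_0 = O_{p^*}(1)$ in the following calculation:
\begin{align*}
&\|n^{1/2}\{\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\}\|_0 \\
&\quad = \big\|n^{1/2}(\mathbb{P}_n - P)[W\hat\rho_n(e_\theta,\theta)\{Z - \hat\eta_n(e_\theta,\theta)\}\Delta] \\
&\qquad + n^{1/2}P[W\{\hat\rho_n(e_\theta,\theta) - \rho_0(e_\theta,\theta)\}Z\Delta] \\
&\qquad - n^{1/2}P[W\{\hat\rho_n(e_\theta,\theta)\hat\eta_n(e_\theta,\theta) - \rho_0(e_\theta,\theta)\eta_0(e_\theta,\theta)\}\Delta]\big\|_0 \\
&\quad \le \|\mathbb{G}_n[W\hat\rho_n(e_\theta,\theta)\{Z - \hat\eta_n(e_\theta,\theta)\}\Delta]\|_0 + n^{1/2}\|\hat\rho_n - \rho_0\|_0 \cdot P|WZ\Delta| \\
&\qquad + \tfrac{1}{2}\big(n^{1/2}\|\hat\rho_n - \rho_0\|_0 \cdot \|\hat\eta_n + \eta_0\|_0 + \|\hat\rho_n + \rho_0\|_0 \cdot n^{1/2}\|\hat\eta_n - \eta_0\|_0\big)P(W\Delta) \\
&\quad = O_{p^*}(1).
\end{align*}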

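Because $\Psi(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) = 0$ and $|\hat\theta_n - \theta_0| = o_{p^*}(1)$ by Theorem 3.1, we then have
\[
\begin{aligned}
O_{p^*}(1) &= -n^{1/2}\{\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) - \Psi(\hat\theta_n, \eta_0(\cdot,\hat\theta_n), \rho_0(\cdot,\hat\theta_n))\} \\
&= o_{p^*}(1) + n^{1/2}\Psi(\hat\theta_n, \eta_0(\cdot,\hat\theta_n), \rho_0(\cdot,\hat\theta_n)) - n^{1/2}\Psi(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) \\
&= o_{p^*}(1) + n^{1/2}(\hat\theta_n - \theta_0)\dot\Psi_\theta(\theta^*, \eta_0(\cdot,\theta^*), \rho_0(\cdot,\theta^*)) \\
&= o_{p^*}(1) + n^{1/2}(\hat\theta_n - \theta_0)\{\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1)\},
\end{aligned}
\tag{6.4}
\]
where $\theta^*$ is a point between $\theta_0$ and $\hat\theta_n$. Thus, $n^{1/2}(\hat\theta_n - \theta_0) = O_{p^*}(1)$.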

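Let $C_n = n^{-1}\sum_{i=1}^n W_i$. By the central limit theorem, $n^{1/2}(C_n - 1) = O_p(1)$. Thus, when $\hat\rho_n$ takes the form in (2.3) and $\rho_0(t,\theta) = \Pr(e_\theta \ge t)$, they are clearly bounded, and we can show $n^{1/2}\|\hat\rho_n - \rho_0\|_0 = O_{p^*}(1)$ by the following calculation:
\begin{align*}
n^{1/2}\{\hat\rho_n(t,\theta) - \rho_0(t,\theta)\}
&= n^{1/2}\left[\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\} - \frac{D_n^{(0)}(t,\theta)}{C_n}\{C_n - 1\}\right] \\
&= n^{1/2}\big[\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\} - d^{(0)}(t,\theta)\{C_n - 1\}\big] + o_{p^*}(1) \\
&= n^{1/2}\{D_n^{(0)}(t,\theta) - C_n d^{(0)}(t,\theta)\} + o_{p^*}(1) \\
&= \mathbb{G}_n[W\{1(e_\theta \ge t) - d^{(0)}(t,\theta)\}] + o_{p^*}(1).
\end{align*}

We have already shown in the proof of Theorem 3.1 that such chosen $\hat\rho_n$ and $\rho_0$ belong to a Donsker class.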
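6.3. Proof of Theorem 3.3. The differentiability of both $\rho_0(e_\theta,\theta)$ and $\eta_0(e_\theta,\theta)$ in $\theta$ and its implication of the differentiability of $\Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))$ in $\theta$, as well as the continuity and boundedness of the derivatives, can be shown by interchanging integration and differentiation, which is warranted by the dominated convergence theorem under the given regularity conditions. From Theorem 3.2, we know that $|\hat\theta_n - \theta_0| = O_{p^*}(n^{-1/2})$, so it suffices to consider $\theta$ with $|\theta - \theta_0| \le Kn^{-1/2}$ with $K < \infty$. Then, we have
\begin{align*}
&n^{1/2}\{\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta)) - \Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0))\} \\
&\quad = n^{1/2}\big[\mathbb{P}_n W\hat\rho_n(e_\theta,\theta)\{Z - \hat\eta_n(e_\theta,\theta)\}\Delta - \mathbb{P}_n W\hat\rho_n(e_\theta,\theta)\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\big] \tag{6.5} \\
&\qquad + n^{1/2}\big[\mathbb{P}_n W\hat\rho_n(e_\theta,\theta)\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta - \mathbb{P}_n W\hat\rho_n(e_0,\theta_0)\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\big]. \tag{6.6}
\end{align*}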
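We first look at term (6.5), which can be rewritten as
\begin{align*}
&n^{1/2}\big[-\mathbb{P}_n W\hat\rho_n(e_\theta,\theta)\hat\eta_n(e_\theta,\theta)\Delta + \mathbb{P}_n W\hat\rho_n(e_\theta,\theta)\hat\eta_n(e_0,\theta_0)\Delta\big] \\
&\quad = -\mathbb{G}_n[W\hat\rho_n(e_\theta,\theta)\{\hat\eta_n(e_\theta,\theta) - \hat\eta_n(e_0,\theta_0)\}\Delta] \tag{6.7} \\
&\qquad - n^{1/2}P[W\hat\rho_n(e_\theta,\theta)\{\hat\eta_n(e_\theta,\theta) - \hat\eta_n(e_0,\theta_0)\}\Delta]. \tag{6.8}
\end{align*}

Term (6.7) converges to zero in outer probability, because $W\hat\rho_n\hat\eta_n\Delta$ belongs to a Donsker class by arguments similar to those in the proof of Theorem 3.1, and $W\hat\rho_n(e_\theta,\theta)\{\hat\eta_n(e_\theta,\theta) - \hat\eta_n(e_0,\theta_0)\}\Delta$ converges to zero in quadratic mean. Let $t' = t - (\theta - \theta_0)'z$. Direct calculation yields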



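\begin{align*}
&n^{1/2}P[W\hat\rho_n(e_\theta,\theta)\{\hat\eta_n(e_\theta,\theta) - \eta_0(e_\theta,\theta)\}\Delta] \\
&\quad = n^{1/2}\int \hat\rho_n(t',\theta)\bigg[\frac{1}{D_n^{(0)}(t',\theta)}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\} \\
&\qquad\qquad - \frac{d^{(1)}(t',\theta)}{D_n^{(0)}(t',\theta)\,d^{(0)}(t',\theta)}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\bigg]\delta\,dP_{e_0,\Delta,Z}(t,\delta,z) \\
&\quad = n^{1/2}\int \hat\rho_n(t',\theta)\bigg[\frac{1}{d^{(0)}(t',\theta)}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\} \\
&\qquad\qquad - \frac{d^{(1)}(t',\theta)}{d^{(0)}(t',\theta)^2}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\bigg]\delta\,dP_{e_0,\Delta,Z}(t,\delta,z) + o_{p^*}(1) \\
&\quad = \int \mathbb{G}_n\,\hat\rho_n(t',\theta)\,d^{(0)}(t',\theta)^{-1}W1(e_\theta \ge t')\{Z - \eta_0(t',\theta)\}\,dP_{e_0,\Delta,Z}(t,1,z) + o_{p^*}(1) \\
&\quad = \int \mathbb{G}_n\,\hat\rho_n(t',\theta)\,\ell(t',\theta,W,Z,e_\theta)\,dP_{e_0,\Delta,Z}(t,1,z) + o_{p^*}(1), \tag{6.9}
\end{align*}
where $\ell(t',\theta,W,Z,e_\theta) = d^{(0)}(t',\theta)^{-1}W1(e_\theta \ge t')\{Z - \eta_0(t',\theta)\}$ and $P_{e_0,\Delta,Z}$ denotes the joint probability law of $(e_0, \Delta, Z)$. Clearly, the class of functions $\{\hat\rho_n(t,\theta)\ell(t,\theta,W,Z,e_\theta)\}$ is Donsker. The above middle equality holds because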

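\begin{align*}
&\bigg|\,n^{1/2}\int \hat\rho_n(t',\theta)\bigg[\frac{1}{D_n^{(0)}(t',\theta)}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\} - \frac{d^{(1)}(t',\theta)}{D_n^{(0)}(t',\theta)\,d^{(0)}(t',\theta)}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\bigg]\delta\,dP_{e_0,\Delta,Z}(t,\delta,z) \\
&\quad - n^{1/2}\int \hat\rho_n(t',\theta)\bigg[\frac{1}{d^{(0)}(t',\theta)}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\} - \frac{d^{(1)}(t',\theta)}{d^{(0)}(t',\theta)^2}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\bigg]\delta\,dP_{e_0,\Delta,Z}(t,\delta,z)\bigg| \\
&\quad \le \bigg|\int \hat\rho_n(t',\theta)\bigg\{\frac{1}{D_n^{(0)}(t',\theta)} - \frac{1}{d^{(0)}(t',\theta)}\bigg\}n^{1/2}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\}\delta\,dP_{e_0,\Delta,Z}(t,\delta,z)\bigg| \\
&\qquad + \bigg|\int \hat\rho_n(t',\theta)\bigg\{\frac{d^{(1)}(t',\theta)}{d^{(0)}(t',\theta)^2} - \frac{d^{(1)}(t',\theta)}{D_n^{(0)}(t',\theta)\,d^{(0)}(t',\theta)}\bigg\}n^{1/2}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\delta\,dP_{e_0,\Delta,Z}(t,\delta,z)\bigg| \\
&\quad \le \|\hat\rho_n\| \cdot \bigg\|\frac{1}{D_n^{(0)}} - \frac{1}{d^{(0)}}\bigg\| \cdot \|n^{1/2}\{D_n^{(1)}(t,\theta) - d^{(1)}(t,\theta)\}\| \cdot 1 \\
&\qquad + \|\hat\rho_n\| \cdot \bigg\|\frac{d^{(1)}}{d^{(0)2}} - \frac{d^{(1)}}{D_n^{(0)}d^{(0)}}\bigg\| \cdot \|n^{1/2}\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\}\| \cdot 1 \\
&\quad = o_{p^*}(1) \cdot O_{p^*}(1) \cdot 1 = o_{p^*}(1)
\end{align*}
by the tail bounds for the supremum of empirical processes in [27], Section 2.14. Similarly, we have
\begin{align*}
&n^{1/2}P[W\hat\rho_n(e_\theta,\theta)\{\hat\eta_n(e_0,\theta_0) - \eta_0(e_0,\theta_0)\}\Delta] \\
&\quad = \int \mathbb{G}_n\,\hat\rho_n(t',\theta)\,\ell(t,\theta_0,W,Z,e_0)\,dP_{e_0,\Delta,Z}(t,1,z) + o_{p^*}(1).
\end{align*}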
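Thus, (6.8) becomes
\begin{align*}
&-n^{1/2}P[W\hat\rho_n(e_\theta,\theta)\{\eta_0(e_\theta,\theta) - \eta_0(e_0,\theta_0)\}\Delta] \\
&\quad - \int \mathbb{G}_n\,\hat\rho_n(t',\theta)\{\ell(t',\theta,W,Z,e_\theta) - \ell(t,\theta_0,W,Z,e_0)\}\,dP_{e_0,\Delta,Z}(t,1,z) \tag{6.10} \\
&\quad + o_{p^*}(1).
\end{align*}

Note that $n^{1/2}\{\eta_0(e_\theta,\theta) - \eta_0(e_0,\theta_0)\} = n^{1/2}(\theta - \theta_0)\{\dot\eta_{0\theta}(e_{\theta^*},\theta^*)\}$ is bounded (by assumptions of bounded density functions for failure and censoring times in [30]), where $\dot\eta_{0\theta}$ denotes the derivative of $\eta_0$ with respect to $\theta$, and $\theta^*$ is a point between $\theta_0$ and $\theta$. Thus, by repeatedly using the dominated convergence theorem, we know that the first term in (6.10) equals
\[
-n^{1/2}(\theta - \theta_0)P\{\rho_0(e_\theta,\theta)\dot\eta_{0\theta}(e_0,\theta_0)\Delta\} + o_{p^*}(1),
\]
which in turn equals
\[
-n^{1/2}(\theta - \theta_0)P\{\rho_0(e_0,\theta_0)\dot\eta_{0\theta}(e_0,\theta_0)\Delta\} + o_{p^*}(1).
\]

It can be verified that $\hat\rho_n(t',\theta)\{\ell(t',\theta,W,Z,e_\theta) - \ell(t,\theta_0,W,Z,e_0)\}$ converges to zero in quadratic mean; thus,
\[
\|\mathbb{G}_n\,\hat\rho_n(t',\theta)\{\ell(t',\theta,W,Z,e_\theta) - \ell(t,\theta_0,W,Z,e_0)\}\| = o_{p^*}(1),
\]
and the second term in (6.10) converges to zero in outer probability. So we have shown that term (6.5) is asymptotically equivalent to $-n^{1/2}(\theta - \theta_0)P\{\rho_0(e_0,\theta_0)\dot\eta_{0\theta}(e_0,\theta_0)\Delta\}$.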

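We now consider term (6.6), which can be rewritten as
\begin{align*}
&n^{1/2}\mathbb{P}_n[W\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\{\hat\rho_n(e_\theta,\theta) - \hat\rho_n(e_0,\theta_0)\}] \\
&\quad = \mathbb{G}_n[W\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\{\hat\rho_n(e_\theta,\theta) - \hat\rho_n(e_0,\theta_0)\}] \tag{6.11} \\
&\qquad + n^{1/2}P[W\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\{\hat\rho_n(e_\theta,\theta) - \hat\rho_n(e_0,\theta_0)\}]. \tag{6.12}
\end{align*}

Because $W\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\{\hat\rho_n(e_\theta,\theta) - \hat\rho_n(e_0,\theta_0)\}$ belongs to a Donsker class and converges to zero in quadratic mean, we know that term (6.11) converges to zero in outer probability. Similar to the calculation in (6.9), for (6.12), we have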

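\begin{align*}
&n^{1/2}P[W\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\{\hat\rho_n(e_\theta,\theta) - \rho_0(e_\theta,\theta)\}] \\
&\quad = n^{1/2}\int \{z - \hat\eta_n(t,\theta_0)\}\bigg[\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\} - \frac{D_n^{(0)}(t',\theta)}{C_n}\{C_n - 1\}\bigg]\,dP_{e_0,\Delta,Z}(t,1,z) \\
&\quad = n^{1/2}\int \{z - \hat\eta_n(t,\theta_0)\}\{D_n^{(0)}(t',\theta) - C_n d^{(0)}(t',\theta)\}\,dP_{e_0,\Delta,Z}(t,1,z) + o_{p^*}(1) \tag{6.13} \\
&\quad = \int \mathbb{G}_n\big[\{z - \hat\eta_n(t,\theta_0)\}W\{1(e_\theta \ge t') - d^{(0)}(t',\theta)\}\big]\,dP_{e_0,\Delta,Z}(t,1,z) + o_{p^*}(1).
\end{align*}
Similarly, we have
\begin{align*}
&n^{1/2}P[W\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\{\hat\rho_n(e_0,\theta_0) - \rho_0(e_0,\theta_0)\}] \\
&\quad = \int \mathbb{G}_n\big[\{z - \hat\eta_n(t,\theta_0)\}W\{1(e_0 \ge t) - d^{(0)}(t,\theta_0)\}\big]\,dP_{e_0,\Delta,Z}(t,1,z) + o_{p^*}(1).
\end{align*}

Then, term (6.12) becomes
\begin{align*}
&n^{1/2}P[W\{Z - \hat\eta_n(e_0,\theta_0)\}\Delta\{\rho_0(e_\theta,\theta) - \rho_0(e_0,\theta_0)\}] \\
&\quad + \int \mathbb{G}_n\{z - \hat\eta_n(t,\theta_0)\}W\big[\{1(e_\theta \ge t') - d^{(0)}(t',\theta)\} \\
&\qquad\qquad - \{1(e_0 \ge t) - d^{(0)}(t,\theta_0)\}\big]\,dP_{e_0,\Delta,Z}(t,1,z) + o_{p^*}(1).
\end{align*}

Similar to the arguments following (6.10), we know that the first term above is asymptotically equivalent to $n^{1/2}(\theta - \theta_0)P[\{Z - \eta_0(e_0,\theta_0)\}\Delta\dot\rho_{0\theta}(e_0,\theta_0)]$, and the second term above is $o_{p^*}(1)$. So, term (6.6) can be replaced by $n^{1/2}(\theta - \theta_0)P[\{Z - \eta_0(e_0,\theta_0)\}\Delta\dot\rho_{0\theta}(e_0,\theta_0)] + o_{p^*}(1)$.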

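Then, from the above calculation for terms (6.5) and (6.6), we obtain
\[
\begin{aligned}
&n^{1/2}\{\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta)) - \Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0))\} \\
&\quad = -n^{1/2}(\theta - \theta_0)P\{\rho_0(e_0,\theta_0)\dot\eta_{0\theta}(e_0,\theta_0)\Delta\} \\
&\qquad + n^{1/2}(\theta - \theta_0)P[\{Z - \eta_0(e_0,\theta_0)\}\Delta\dot\rho_{0\theta}(e_0,\theta_0)] + o_{p^*}(1) \\
&\quad = n^{1/2}(\theta - \theta_0)\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1),
\end{aligned}
\tag{6.14}
\]
which yields the asymptotic linearity (3.2) when $\theta$ is replaced by $\hat\theta_n$. In fact, in the above expression, we have $P[\{Z - \eta_0(e_0,\theta_0)\}\Delta\dot\rho_{0\theta}(e_0,\theta_0)] = 0$, given the equality $\eta_0(e_0,\theta_0) = E(Z \mid e_0, \Delta = 1)$, which can be verified directly (see also [21]). We keep it in the above calculation so as to clearly show the relationship of $\dot\Psi_\theta$ and $(\dot\eta_{0\theta}, \dot\rho_{0\theta})$.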

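Since $\hat\theta_n$ satisfies $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) = o_{p^*}(n^{-1/2})$, showing asymptotic normality for $n^{1/2}(\hat\theta_n - \theta_0)$ is equivalent to showing asymptotic normality for $n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0))$. The following shows the calculation. By adding, subtracting and rearranging terms, we have
\begin{align*}
&n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) \\
&\quad = \mathbb{G}_n[W\rho_0(e_0,\theta_0)\{Z - \eta_0(e_0,\theta_0)\}\Delta] \\
&\qquad - \mathbb{G}_n[W\hat\rho_n(e_0,\theta_0)\{\hat\eta_n(e_0,\theta_0) - \eta_0(e_0,\theta_0)\}\Delta] \tag{6.15} \\
&\qquad + \mathbb{G}_n[W\{Z - \eta_0(e_0,\theta_0)\}\Delta\{\hat\rho_n(e_0,\theta_0) - \rho_0(e_0,\theta_0)\}] \tag{6.16} \\
&\qquad - n^{1/2}P[W\hat\rho_n(e_0,\theta_0)\{\hat\eta_n(e_0,\theta_0) - \eta_0(e_0,\theta_0)\}\Delta] \tag{6.17} \\
&\qquad + n^{1/2}P[W\{Z - \eta_0(e_0,\theta_0)\}\Delta\{\hat\rho_n(e_0,\theta_0) - \rho_0(e_0,\theta_0)\}]. \tag{6.18}
\end{align*}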
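
The rearrangement behind terms (6.15) to (6.18) rests on a pointwise identity for the summand $W\hat\rho_n\{Z - \hat\eta_n\}\Delta$: it splits into a leading part evaluated at $(\eta_0, \rho_0)$ plus one correction for $\hat\eta_n - \eta_0$ and one for $\hat\rho_n - \rho_0$. The check below is a sketch in exact arithmetic; the function names and the numerical values are hypothetical.

```python
from fractions import Fraction


def summand(w, z, delta, rho_hat, eta_hat):
    """Pointwise summand W * rho_hat * (Z - eta_hat) * Delta."""
    return w * rho_hat * (z - eta_hat) * delta


def summand_split(w, z, delta, rho_hat, eta_hat, rho0, eta0):
    """Split the summand into the three pieces used in the rearrangement."""
    lead = w * rho0 * (z - eta0) * delta                # leading term at (eta0, rho0)
    eta_part = -w * rho_hat * (eta_hat - eta0) * delta  # matches (6.15) and (6.17)
    rho_part = w * (z - eta0) * delta * (rho_hat - rho0)  # matches (6.16) and (6.18)
    return lead, eta_part, rho_part


# Hypothetical values; exact rationals avoid rounding in the comparison.
args = dict(w=Fraction(5, 2), z=Fraction(3), delta=1,
            rho_hat=Fraction(7, 10), eta_hat=Fraction(9, 4))
pieces = summand_split(rho0=Fraction(2, 3), eta0=Fraction(2), **args)
whole = summand(**args)
```

Applying $\mathbb{P}_n = n^{-1/2}\mathbb{G}_n + P$ to each piece, and using that the leading piece has mean zero under $P$, gives the five-term expression displayed above.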

Repeatedly using similar arguments, we can show that terms (6.15) and (6.16) are $o_{p^*}(1)$. Term (6.17) can be calculated similarly, as in (6.9), but with $t = t'$, so that the lowercase variable $z$ is not involved in the integrand, and $\hat\rho_n$ can be further replaced by $\rho_0$. Term (6.18) can be calculated similarly, as in (6.13). We then have
\begin{align*}
&n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) \\
&\quad = \mathbb{G}_n\Big[W\rho_0(e_0,\theta_0)\{Z - \eta_0(e_0,\theta_0)\}\Delta \\
&\qquad\quad - \int \rho_0(t,\theta_0)\,d^{(0)}(t,\theta_0)^{-1}W1(e_0 \ge t)\{Z - \eta_0(t,\theta_0)\}\,dP_{e_0,\Delta}(t,1) \\
&\qquad\quad + \int \{z - \eta_0(t,\theta_0)\}W\{1(e_0 \ge t) - d^{(0)}(t,\theta_0)\}\,dP_{e_0,\Delta,Z}(t,1,z)\Big] \tag{6.19} \\
&\qquad + o_{p^*}(1) \\
&\quad = \mathbb{G}_n\Big[W\rho_0(e_0,\theta_0)\{Z - \eta_0(e_0,\theta_0)\}\Delta \\
&\qquad\quad - \int \rho_0(t,\theta_0)W1(e_0 \ge t)\{Z - \eta_0(t,\theta_0)\}\,d\Lambda_0(t)\Big] \tag{6.20} \\
&\qquad + o_{p^*}(1),
\end{align*}
which converges in distribution to a normal random variable by the central limit theorem, because the influence function in the above expression is bounded. Here, $\Lambda_0$ is the cumulative hazard function of $\varepsilon_0 = T - \theta_0'Z$. So, from equation (3.2), we know that $n^{1/2}(\hat\theta_n - \theta_0)$ is asymptotically normal with asymptotic representation (3.3) if $\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0))$ is nonsingular. That the term (6.19), yielded by estimating the weight function $\rho_0(t,\theta)$, is equal to zero can be verified directly, again by using the equality $\eta_0(e_0,\theta_0) = E(Z \mid e_0, \Delta = 1)$. Term (6.20) is obtained from the following calculation:

\begin{align*}
d^{(0)}(t,\theta_0) &= P\{W1(Y - \theta_0'Z \ge t)\} \\
&= P\{1(Y - \theta_0'Z \ge t)\} \\
&= E[E\{1(Y - \theta_0'Z \ge t) \mid Z\}] \\
&= E[\Pr(T - \theta_0'Z \ge t \mid Z)\Pr(C - \theta_0'Z \ge t \mid Z)] \\
&= \int \exp\{-\Lambda_0(t)\}\{1 - G(t \mid z)\}\,dH(z),
\end{align*}
where $G(\cdot \mid z)$ is the conditional distribution function of the centered censoring time $C - \theta_0'Z$ given $Z = z$ and $H$ is the marginal distribution function of covariate $Z$. On the other hand, from the joint distribution of $(e_0, \Delta, Z)$, we obtain
\[
dP_{e_0,\Delta}(t,1) = \bigg[\int \exp\{-\Lambda_0(t)\}\{1 - G(t \mid z)\}\,dH(z)\bigg]d\Lambda_0(t) = d^{(0)}(t,\theta_0)\,d\Lambda_0(t).
\]

That term (6.19) is zero becomes even more straightforward from term (6.18) if the weight function $\rho_0$ is given and, thus, need not be estimated (e.g., $\hat\rho_n = \rho_0 = 1$).

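6.4. Proof of Theorem 3.4. We will sequentially show consistency, root-$n$ rate convergence and the asymptotic normality of $\hat\theta_n^*$. It is easy to see that $\{W(x;\alpha) : \alpha \in \mathcal{A}_0\}$ is Lipschitz in $\alpha$ and, hence, Donsker (see Example 3.2.12 of [27]), so we have that $\{\hat\eta_n^*\}$ and $\{\hat\rho_n^*\}$ are Donsker (see Section 2.10 of [27]). Based on the smoothness of $W(X;\alpha)$ in $\alpha$ and the structures of $\hat\eta_n$, $\hat\rho_n$, $\hat\eta_n^*$ and $\hat\rho_n^*$ given in (2.2), (2.3), (3.5) and (3.6), we have
\[
\|W(X;\hat\alpha_n) - W(X;\alpha_0)\| \to 0, \qquad \|\hat\eta_n^* - \hat\eta_n\| \to 0 \quad\text{and}\quad \|\hat\rho_n^* - \hat\rho_n\| \to 0
\]
in outer probability by the mean value theorem and boundedness of the corresponding derivatives with respect to $\alpha$. The above three quantities are actually $O_{p^*}(n^{-1/2})$ by the root-$n$ consistency of $\hat\alpha_n$ and the smoothness assumption of $W(X;\alpha)$. Thus, with $W(X_i;\hat\alpha_n)$ replaced by $W_i$ in $\Psi_n^*$, we have
\[
\begin{aligned}
&\|\Psi_n^*(\theta, \hat\eta_n^*(\cdot,\theta), \hat\rho_n^*(\cdot,\theta)) - \Psi_n(\theta, \hat\eta_n^*(\cdot,\theta), \hat\rho_n^*(\cdot,\theta))\| \\
&\quad \le \|W(X;\hat\alpha_n) - W(X;\alpha_0)\|\,\|\hat\rho_n^*(e_\theta,\theta)\{Z - \hat\eta_n^*(e_\theta,\theta)\}\Delta\| \\
&\quad = o_{p^*}(1)
\end{aligned}
\tag{6.21}
\]
by the boundedness of $\hat\rho_n^*(e_\theta,\theta)\{Z - \hat\eta_n^*(e_\theta,\theta)\}\Delta$. By (6.1), (6.2) and the triangle inequality, we have
\[
\|\hat\eta_n^* - \eta_0\| \to 0 \quad\text{and}\quad \|\hat\rho_n^* - \rho_0\| \to 0
\]
in outer probability, which by Theorem 3.1 imply that
\[
\|\Psi_n(\theta, \hat\eta_n^*(\cdot,\theta), \hat\rho_n^*(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\| = o_{p^*}(1),
\]
since Donsker implies Glivenko-Cantelli. Hence, by the triangle inequality we have
\[
\|\Psi_n^*(\theta, \hat\eta_n^*(\cdot,\theta), \hat\rho_n^*(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\| = o_{p^*}(1),
\]
which yields the consistency of $\hat\theta_n^*$ by the same argument as in the proof of Theorem 3.1.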

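From (6.21), we know that
\[
\|n^{1/2}\{\Psi_n^*(\theta, \hat\eta_n^*(\cdot,\theta), \hat\rho_n^*(\cdot,\theta)) - \Psi_n(\theta, \hat\eta_n^*(\cdot,\theta), \hat\rho_n^*(\cdot,\theta))\}\|_0 = O_{p^*}(1).
\]
Replacing $(\hat\eta_n, \hat\rho_n)$ with $(\hat\eta_n^*, \hat\rho_n^*)$ in (6.3), we obtain
\[
\|n^{1/2}\{\Psi_n(\theta, \hat\eta_n^*(\cdot,\theta), \hat\rho_n^*(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\}\|_0 = O_{p^*}(1).
\]
Hence, by applying the triangle inequality, we have
\[
\|n^{1/2}\{\Psi_n^*(\theta, \hat\eta_n^*(\cdot,\theta), \hat\rho_n^*(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\}\|_0 = O_{p^*}(1),
\]
and the same calculation as in (6.4), with $\Psi_n$ replaced by $\Psi_n^*$ and $\hat\theta_n$ replaced by $\hat\theta_n^*$, shows that $n^{1/2}(\hat\theta_n^* - \theta_0) = O_{p^*}(1)$.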

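We now prove the asymptotic normality of $n^{1/2}(\hat\theta_n^* - \theta_0)$. Consider the following decomposition:
\begin{align*}
&n^{1/2}\Psi_n^*(\hat\theta_n^*, \hat\eta_n^*(\cdot,\hat\theta_n^*), \hat\rho_n^*(\cdot,\hat\theta_n^*)) \\
&\quad = n^{1/2}\Psi_n^*(\hat\theta_n^*, \hat\eta_n^*(\cdot,\hat\theta_n^*), \hat\rho_n^*(\cdot,\hat\theta_n^*)) - n^{1/2}\Psi_n(\hat\theta_n^*, \hat\eta_n(\cdot,\hat\theta_n^*), \hat\rho_n(\cdot,\hat\theta_n^*)) \tag{6.22} \\
&\qquad + n^{1/2}\Psi_n(\hat\theta_n^*, \hat\eta_n(\cdot,\hat\theta_n^*), \hat\rho_n(\cdot,\hat\theta_n^*)) - n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) \tag{6.23} \\
&\qquad + n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) - n^{1/2}\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) \tag{6.24} \\
&\qquad + n^{1/2}\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)). \tag{6.25}
\end{align*}

Then, applying (6.14) to (6.23) and (6.24), respectively, we can replace (6.23) with
\[
n^{1/2}(\hat\theta_n^* - \theta_0)\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1) \tag{6.26}
\]
and replace (6.24) with
\[
-n^{1/2}(\hat\theta_n - \theta_0)\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1). \tag{6.27}
\]
Term (6.25), clearly, is $o_{p^*}(1)$. We then calculate term (6.22). Let
\begin{align*}
\hat\eta_{n,\alpha}(t,\theta) &= \mathbb{P}_n\{W(X;\alpha)1(e_\theta \ge t)Z\}/\mathbb{P}_n\{W(X;\alpha)1(e_\theta \ge t)\}, \\
\hat\rho_{n,\alpha}(t,\theta) &= \mathbb{P}_n\{W(X;\alpha)1(e_\theta \ge t)\}/\mathbb{P}_n\{W(X;\alpha)\}.
\end{align*}
Then, we have $\hat\eta_n = \hat\eta_{n,\alpha_0}$, $\hat\rho_n = \hat\rho_{n,\alpha_0}$, $\hat\eta_n^* = \hat\eta_{n,\hat\alpha_n}$ and $\hat\rho_n^* = \hat\rho_{n,\hat\alpha_n}$. Let
\[
\Phi_n(\alpha,\theta) = \mathbb{P}_n[W(X;\alpha)\hat\rho_{n,\alpha}(e_\theta,\theta)\{Z - \hat\eta_{n,\alpha}(e_\theta,\theta)\}\Delta].
\]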
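
For a fixed set of weights, the doubly weighted estimating function defined above can be evaluated directly from the data. The following is a minimal sketch, assuming a scalar covariate and weights that have already been computed as $W_i = W(X_i;\alpha)$; the function name `psi_n` and the toy data are hypothetical and are not taken from the paper.

```python
def psi_n(theta, y, z, delta, w):
    """Evaluate P_n[ W * rho_hat(e) * {Z - eta_hat(e)} * Delta ] at theta.

    rho_hat(t) = P_n{W 1(e >= t)} / P_n{W} and
    eta_hat(t) = P_n{W 1(e >= t) Z} / P_n{W 1(e >= t)},
    both evaluated at each subject's own residual e_i = y_i - theta * z_i.
    """
    n = len(y)
    e = [yi - theta * zi for yi, zi in zip(y, z)]
    c_n = sum(w) / n  # mean weight C_n
    total = 0.0
    for i in range(n):
        # Censored subjects (delta = 0) and zero-weight subjects contribute 0.
        if delta[i] == 0 or w[i] == 0.0:
            continue
        at_risk = [j for j in range(n) if e[j] >= e[i]]
        d0 = sum(w[j] for j in at_risk) / n         # D_n^(0) at e_i, > 0 since w[i] > 0
        d1 = sum(w[j] * z[j] for j in at_risk) / n  # D_n^(1) at e_i
        total += w[i] * (d0 / c_n) * (z[i] - d1 / d0)
    return total / n


# Toy data (hypothetical): subject 3 is censored and has weight 0.
y = [1.0, 2.0, 3.0, 5.0]
z = [0.0, 1.0, 0.0, 3.0]
delta = [1, 1, 0, 1]
w = [1.0, 2.0, 0.0, 1.0]
value = psi_n(0.5, y, z, delta, w)
```

Re-running `psi_n` with weights computed at an estimate of $\alpha$ instead of the true value is the numerical counterpart of comparing $\Phi_n(\hat\alpha_n,\theta)$ with $\Phi_n(\alpha_0,\theta)$ in the expansion that follows.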
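It can be seen by direct calculation that the second derivative of $\Phi_n(\alpha,\theta)$ with respect to $\alpha$ is bounded with outer probability 1. So, by the Taylor expansion, we have
\[
\Phi_n(\hat\alpha_n,\theta) - \Phi_n(\alpha_0,\theta) = \dot\Phi_{n,\alpha}(\alpha_0,\theta)(\hat\alpha_n - \alpha_0) + O_{p^*}(|\hat\alpha_n - \alpha_0|^2),
\]
where
\begin{align*}
\dot\Phi_{n,\alpha}(\alpha_0,\theta) = \mathbb{P}_n\bigg[&\hat\rho_{n,\alpha_0}(e_\theta,\theta)\{Z - \hat\eta_{n,\alpha_0}(e_\theta,\theta)\}\frac{\partial W(X;\alpha)}{\partial\alpha'}\bigg|_{\alpha=\alpha_0}\Delta \\
&+ W(X;\alpha_0)\{Z - \hat\eta_{n,\alpha_0}(e_\theta,\theta)\}\frac{\partial\hat\rho_{n,\alpha}(e_\theta,\theta)}{\partial\alpha'}\bigg|_{\alpha=\alpha_0}\Delta \\
&- W(X;\alpha_0)\hat\rho_{n,\alpha_0}(e_\theta,\theta)\frac{\partial\hat\eta_{n,\alpha}(e_\theta,\theta)}{\partial\alpha'}\bigg|_{\alpha=\alpha_0}\Delta\bigg].
\end{align*}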



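It is also easy to see, by direct calculation, that $\{\partial\hat\eta_{n,\alpha}/\partial\alpha|_{\alpha=\alpha_0} : \theta \in \Theta_0\}$ and $\{\partial\hat\rho_{n,\alpha}/\partial\alpha|_{\alpha=\alpha_0} : \theta \in \Theta_0\}$ are (componentwise) Glivenko-Cantelli, so, with outer probability 1, we have
\begin{align*}
\dot\Phi_{n,\alpha}(\alpha_0,\hat\theta_n^*) \to{}& P[\rho_0(e_0,\theta_0)\{Z - \eta_0(e_0,\theta_0)\}(W_\alpha(X;\alpha_0))'\Delta] \\
&+ P[W(X;\alpha_0)\{Z - \eta_0(e_0,\theta_0)\}A_1(e_0,\theta_0)\Delta] \tag{6.28} \\
&- P[W(X;\alpha_0)\rho_0(e_0,\theta_0)A_2(e_0,\theta_0)\Delta] \\
={}& P[\rho_0(e_0,\theta_0)\{Z - \eta_0(e_0,\theta_0)\}(W_\alpha(X;\alpha_0))'\Delta] \\
&- P[\rho_0(e_0,\theta_0)A_2(e_0,\theta_0)\Delta],
\end{align*}
where $W_\alpha(X;\alpha) = \partial W(X;\alpha)/\partial\alpha$, $A_1$ is the limit of $\partial\hat\rho_{n,\alpha}/\partial\alpha'|_{\alpha=\alpha_0}$ and $A_2$ is the limit of $\partial\hat\eta_{n,\alpha}/\partial\alpha'|_{\alpha=\alpha_0}$. The term (6.28) is zero since $E(Z \mid e_0, \Delta = 1) = \eta_0(e_0,\theta_0)$. Note that $E(W \mid X) = 1$ is also used in the above calculation. It can be directly verified that
\[
A_2(t,\theta_0) = \frac{1}{P\{1(e_0 \ge t)\}}\big[P\{1(e_0 \ge t)Z(W_\alpha(X;\alpha_0))'\} - \eta_0(t,\theta_0)P\{(W_\alpha(X;\alpha_0))'1(e_0 \ge t)\}\big].
\]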

Hence, we have
\[
\Psi_n^*(\hat\theta_n^*, \hat\eta_n^*, \hat\rho_n^*) - \Psi_n(\hat\theta_n^*, \hat\eta_n, \hat\rho_n) = -B(\hat\alpha_n - \alpha_0) + o_{p^*}(n^{-1/2}). \tag{6.29}
\]
Replacing (6.22), (6.23) and (6.24) by (6.29), (6.26) and (6.27), respectively, we obtain
\[
n^{1/2}(\hat\theta_n^* - \theta_0) = n^{1/2}(\hat\theta_n - \theta_0) + \{\dot\Psi_\theta(\theta_0, \eta_0, \rho_0)\}^{-1}Bn^{1/2}(\hat\alpha_n - \alpha_0) + o_{p^*}(1).
\]
By (3.3) we know that $\hat\theta_n$ is an asymptotically linear estimator. Given that $\hat\alpha_n$ is also an asymptotically linear estimator, we know that $n^{1/2}(\hat\theta_n - \theta_0)$ and $n^{1/2}(\hat\alpha_n - \alpha_0)$ are asymptotically jointly normal by the multivariate central limit theorem. Hence, by [18], we know that $n^{1/2}(\hat\theta_n^* - \theta_0)$ is asymptotically normal with variance given in (3.7).